A new benchmark dataset with production methodology for short text semantic similarity algorithms
Similar resources
Benchmarking short text semantic similarity
Short Text Semantic Similarity measurement is a new and rapidly growing field of research. “Short texts” are typically sentence length but are not required to be grammatically correct. There is great potential for applying these measures in fields such as Information Retrieval, Dialogue Management and Question Answering. A dataset of 65 sentence pairs, with similarity ratings, produced in 2006 ...
Text-to-Text Semantic Similarity for Automatic Short Answer Grading
In this paper, we explore unsupervised techniques for the task of automatic short answer grading. We compare a number of knowledge-based and corpus-based measures of text similarity, evaluate the effect of domain and size on the corpus-based measures, and also introduce a novel technique to improve the performance of the system by integrating automatic feedback from the student answers. Overall...
A Comparative Study of Two Short Text Semantic Similarity Measures
This paper describes a comparative study of STASIS and LSA. These measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have b...
Czech Dataset for Semantic Similarity and Relatedness
This paper introduces a Czech dataset for semantic similarity and semantic relatedness. The dataset contains word pairs with hand-annotated scores that indicate the semantic similarity and semantic relatedness of the words. The dataset contains 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora including extra examples when the wo...
ECNUCS: Measuring Short Text Semantic Equivalence Using Multiple Similarity Measurements
This paper reports our submissions to the Semantic Textual Similarity (STS) task in *SEM Shared Task 2013. We submitted three Support Vector Regression (SVR) systems in the core task, using 6 types of similarity measures, i.e., string similarity, number similarity, knowledge-based similarity, corpus-based similarity, syntactic dependency similarity and machine translation similarity. Our third syst...
Journal
Journal title: ACM Transactions on Speech and Language Processing
Year: 2013
ISSN: 1550-4875,1550-4883
DOI: 10.1145/2537046